check_emc_clariion.pl maintained by Box293
2014-05-06
Copyright (c) 2012 Troy Lea aka Box293
plugins@box293.com
Twitter: @Box293

This plugin allows you to monitor an EMC CLARiiON SAN.

You can monitor the following components of the SAN:
* Storage Processors = Status of each SP
* Storage Processors Information = Gets information on the SP (SP ID, Agent Revision, FLARE Revision, PROM Revision, Model/Type, Memory and Serial Number)
* Storage Processors Busy Percentage = SP Busy % with performance data for graphing purposes
* Disks = Status of the Physical Disks attached in all the Disk Array Enclosures
* Cache = Status of the Read and Write Cache
* Faults = Report any Faults on the SAN
* Percentage Dirty Pages in Cache = % Dirty Pages in Cache with Performance Data for graphing purposes
* Port State = Status of the Ports on an SP
* HBA State = Status of a client's host bus adapter connection
* LUNs = Checks the status of a specific LUN and reports State, ID, Name, Size, Free Space, RAID Group Type and Percentage Rebuilt
* RAID Group = Checks the status of a specific RAID Group and reports the State, ID, RAID Group Type, Logical Size, Free Space, Percentage Defragmentation Complete and Percentage Expansion Complete
* Storage Pool = Returns capacity usage information of the Storage Pool and reports the State, ID, RAID Type, Available Capacity, Consumed Capacity, Subscribed Capacity, Percentage Used and Percentage Free
* Temperature = Gets the inlet air temperature and returns Performance Data for graphing purposes
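
Several of the checks above return performance data for graphing. As a rough illustration, the sketch below parses the standard Nagios perfdata format ('label'=value[UOM];warn;crit;min;max) that follows the "|" in plugin output. The sample line is made up for the example and was NOT captured from this plugin, so the label name and message text are assumptions.

```python
import re

# One perfdata token in the standard Nagios format:
#   'label'=value[UOM];[warn];[crit];[min];[max]
_PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))="
    r"(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?(?:;(?P<max>[^;\s]*))?"
)


def parse_perfdata(plugin_output):
    """Return {label: {...}} for the perfdata found after the first '|'."""
    _, _, perf = plugin_output.partition("|")
    metrics = {}
    for m in _PERF_RE.finditer(perf):
        label = m.group("qlabel") or m.group("label")
        metrics[label] = {
            "value": float(m.group("value")),
            "uom": m.group("uom"),
            "warn": m.group("warn") or None,
            "crit": m.group("crit") or None,
        }
    return metrics


# Hypothetical output line (assumption, not real plugin output).
sample = "OK: SP A busy 12.50% | sp_busy=12.50%;50;70"
metrics = parse_perfdata(sample)
```

Output without a "|" simply yields an empty dictionary. Multi-line perfdata is not handled by this sketch.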

The monitoring of the EMC CLARiiON is performed by a MODIFIED version of the check_emc_clariion.pl script written by Michael Streb @ NETWAYS GmbH.
However, I have since made many changes to the script, so I decided it best to release it as an alternate version. The version notes below highlight the changes that I have implemented.

Requirements:
The plugin relies on the following components:
EMC Linux Navisphere Server Software
* This is the software that communicates with the SAN and is what the plugin uses
Enable SNMP on the SAN
* Navisphere uses this to talk to the SAN
Create a Monitoring Account on the SAN
* For security reasons it's best to create a read-only account for Navisphere to use

Download Linux Navisphere Server Software.
* Go here http://powerlink.emc.com
* Access to the Powerlink website requires you to have an account with EMC
* Usually this is provided as part of your support contract with EMC
* Once logged in navigate your way to:
* Support > Software Downloads and Licensing > Downloads J-O > Navisphere Server Software
* Find the section Linux Navisphere Server Software
* You need to download Navisphere Host Agent/CLI (Linux)
* For example: NaviCLI-Linux-64-x86-en_US-7.31.33.0.41-1.x86_64.rpm
* NOTE: There are 32-bit and 64-bit versions; the example above is the 64-bit version.
* If you can't find this download or section then you need to contact EMC, as your account only has access to the downloads it is registered for.

Installation of Linux Navisphere Server Software.
* Save NaviCLI-Linux-64-x86-en_US-7.31.33.0.41-1.x86_64.rpm to /tmp on your Nagios host.
* Then, from /tmp, run:
* yum install NaviCLI-Linux-64-x86-en_US-7.31.33.0.41-1.x86_64.rpm
* This will install the software and any required dependencies.

Enable SNMP on the EMC SAN
* This is done using the Navisphere web console
* Open a web browser to the IP address of one of the SAN's SPs
* Log in as an administrator
* Expand the tree of your SAN and select one of the SPs
* Right click the SP and select Properties
* Click the Network tab
* Tick the box Enable/Disable processing of SNMP MIB read requests
* Click Apply and then OK
* Repeat these steps for each SP in your SAN
* You can leave Navisphere open as we will use it with the next step

Create a Monitoring Account
* Continuing with your Navisphere web console session
* Click the pull down menu Tools and select Security - User Management...
* Click the Add button
* Username: readonly
* Role: monitor
* Global/Local: global
* Password: type a strong password
* Click OK
* Click Yes to add the new user
* Click OK and then OK again

This completes all the steps required for the plugin to work.

Additionally, you can use the --secfilepath option to point the plugin at a directory containing encrypted security credential files instead of supplying a username and password. First create a directory to store the files, then run a command to create the security files in that directory.

* The following example will create the directory /usr/local/nagios/libexec/check_emc_clariion_security_files for storing the security files.
* You will need to change the username and password to match the credentials you use to connect to the EMC Storage Processor.
* Run these commands to create the security files:
* mkdir /usr/local/nagios/libexec/check_emc_clariion_security_files
* /opt/Navisphere/bin/naviseccli -secfilepath /usr/local/nagios/libexec/check_emc_clariion_security_files -User readonly -Password AStrongPassword -Scope 0 -AddUserSecurity
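
If you would rather script the two steps above, the following is a minimal sketch. The naviseccli path and flags are copied from the example above. The function names are hypothetical, and restricting the directory to mode 0700 is my own assumption (it presumes the script runs as the same user that runs the Nagios checks), not something Navisphere requires.

```python
import os
import stat

# Path taken from the example above; adjust it if Navisphere is installed elsewhere.
NAVISECCLI = "/opt/Navisphere/bin/naviseccli"


def prepare_secfile_dir(path):
    """Create the security file directory, accessible to its owner only.

    Assumption: mode 0700 is a precaution added for this sketch.
    """
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)
    return path


def add_user_security_argv(secdir, user, password, scope=0):
    """Build the naviseccli argument list shown in the example above."""
    return [
        NAVISECCLI,
        "-secfilepath", secdir,
        "-User", user,
        "-Password", password,
        "-Scope", str(scope),
        "-AddUserSecurity",
    ]
```

The returned list can be passed to subprocess.run(argv, check=True) on the Nagios host once the directory exists.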


Command Line Examples:
Status of All Disks 
check_emc_clariion.pl -H 192.168.5.1 -u readonly -p AStrongPassword -t disk
check_emc_clariion.pl -H 192.168.5.1 --secfilepath /usr/local/nagios/libexec/check_emc_clariion_security_files -t disk

Status of Any Faults
check_emc_clariion.pl -H 192.168.5.1 -u readonly -p AStrongPassword -t faults
check_emc_clariion.pl -H 192.168.5.1 --secfilepath /usr/local/nagios/libexec/check_emc_clariion_security_files -t faults

SPA Percentage Busy
check_emc_clariion.pl -H 192.168.5.1 -u readonly -p AStrongPassword -t sp_cbt_busy --sp A --warn 50 --crit 70
check_emc_clariion.pl -H 192.168.5.1 --secfilepath /usr/local/nagios/libexec/check_emc_clariion_security_files -t sp_cbt_busy --sp A --warn 50 --crit 70
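
For testing outside of Nagios, the command lines above can be assembled and run from a small wrapper. This is only a sketch: the PLUGIN path is an assumed install location, build_argv and run_check are hypothetical helper names, and the state mapping relies on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Assumed install location; change it to wherever the plugin lives.
PLUGIN = "/usr/local/nagios/libexec/check_emc_clariion.pl"

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_argv(host, check_type, user=None, password=None,
               secfilepath=None, extra=()):
    """Build the plugin argument list using either form of credentials."""
    argv = [PLUGIN, "-H", host]
    if secfilepath:
        argv += ["--secfilepath", secfilepath]
    elif user and password:
        argv += ["-u", user, "-p", password]
    else:
        raise ValueError("supply either secfilepath or user and password")
    argv += ["-t", check_type, *extra]
    return argv


def run_check(argv, timeout=60):
    """Run a plugin command and return (state name, first output line)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.splitlines()
    return state, (lines[0] if lines else "")
```

For example, build_argv("192.168.5.1", "sp_cbt_busy", user="readonly", password="AStrongPassword", extra=["--sp", "A", "--warn", "50", "--crit", "70"]) produces the arguments of the first SPA Percentage Busy example.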


Setup Examples:

define command {
	command_name check_emc_clariion
	command_line $USER1$/check_emc_clariion.pl -H $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$
	}


SPA Percentage Busy
define service {
	use 	generic-service
	host_name 				EMC_SAN
	service_description 	SPA Percentage Busy
	check_command 			check_emc_clariion!$HOSTADDRESS$!-u readonly!-p AStrongPassword!-t sp_cbt_busy!--sp A!--warn 50!--crit 70
	max_check_attempts		3
	check_interval			3
	retry_interval			3
	register				1
	}


define service {
	use 	generic-service
	host_name 				EMC_SAN
	service_description 	SPA Percentage Busy
	check_command 			check_emc_clariion!$HOSTADDRESS$!--secfilepath /usr/local/nagios/libexec/check_emc_clariion_security_files!-t sp_cbt_busy!--sp A!--warn 50!--crit 70
	max_check_attempts		3
	check_interval			3
	retry_interval			3
	register				1
	}
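
To see how the check_command values in the service definitions end up on the command line, the sketch below mimics the way Nagios splits check_command on "!" and fills $ARG1$ to $ARG8$. It is a simplified illustration and not Nagios's own code: it ignores escaped "\!" and "$$", the function names are hypothetical, and the $USER1$ value is an assumed path.

```python
import re

_MACRO = re.compile(r"\$(\w+)\$")


def _substitute(text, values):
    # Macros with no value expand to an empty string.
    return _MACRO.sub(lambda m: values.get(m.group(1), ""), text)


def expand_check_command(check_command, command_line, macros):
    """Expand $ARGn$ and the supplied macros into the final command line."""
    _name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        # An argument may itself contain a macro such as $HOSTADDRESS$.
        values[f"ARG{i}"] = _substitute(arg, macros)
    return _substitute(command_line, values)


command_line = ("$USER1$/check_emc_clariion.pl -H $ARG1$ $ARG2$ $ARG3$ "
                "$ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$")
check_command = ("check_emc_clariion!$HOSTADDRESS$!-u readonly!"
                 "-p AStrongPassword!-t sp_cbt_busy!--sp A!--warn 50!--crit 70")
macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed value of $USER1$
    "HOSTADDRESS": "192.168.5.1",
}
expanded = expand_check_command(check_command, command_line, macros)
```

The first service definition supplies seven arguments, so $ARG8$ expands to nothing. The --secfilepath variant supplies six, leaving $ARG7$ and $ARG8$ empty.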

	
License:
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program.  If not, see http://www.gnu.org/licenses/.

Help:
To see the help, type:
	check_emc_clariion.pl --help | more

License:
To see the license, type:
	check_emc_clariion.pl --license | more


Version Notes:
2011-03-24
* Modified plugin to perform a Percentage Of Dirty Pages check that returns performance data (cache_pdp).

2011-03-28
* Modified plugin to perform SP Busy and SP Idle checks that return performance data (sp_busy and sp_idle).

2011-05-08
* Modified plugin to perform an SP Busy check [that uses the controller busy and idle ticks] and returns performance data (sp_cbt_busy). This is more accurate than the (sp_busy and sp_idle) method.

2011-06-29
* Modified sp_cbt_busy to check for negative numbers in the data obtained from the SAN.

2011-07-04
* Modified sp_cbt_busy to ensure calculated value does not exceed 100%.
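
The sp_cbt_busy notes above can be summarised with a small sketch. This is an illustration of the guards described in these notes (negative tick values, the 100% cap, and the division by zero fix listed further down), NOT the plugin's actual arithmetic. The function name and the choice to return None for unusable data are assumptions.

```python
def busy_percent(busy_ticks, idle_ticks):
    """Return SP busy % from controller busy/idle ticks, or None if unusable."""
    if busy_ticks < 0 or idle_ticks < 0:
        return None  # negative data from the SAN cannot be trusted
    total = busy_ticks + idle_ticks
    if total == 0:
        return None  # avoids an "Illegal division by zero" style error
    percent = 100.0 * busy_ticks / total
    return min(percent, 100.0)  # defensive cap at 100%
```

A caller would treat None as an UNKNOWN result rather than reporting a percentage.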

2012-03-06
* Modified check_disk to look for Removed drives, which was missing. Also removed a double || symbol in the same section.
* Modified check_portstate to include code supplied by Federspiel Till. When all ports were checked, the error_count was being determined incorrectly.

2012-10-23
* Corrected POD formatting to fix POD ERRORS.
* Added error checking to ensure we are getting expected results from the Navisphere CLI app.

2012-12-05
* Updated plugin to check for navicli or naviseccli; it will use naviseccli if present. It also makes sure that the username and password arguments have been provided. This fixes a problem with newer releases of Navisphere that only come with naviseccli (reported by Charles Breite).

2012-12-11
* Fixed bug in portstate check; it now uses a case-insensitive regex (reported by Charles Breite).
* Added a check that displays the help if the user did not provide any options.

2012-12-19
* Updated code to prevent errors if required arguments are missing; in that case it will display the help.
* Updated the help to include information about Secure vs Non-Secure and also provided several examples.

2013-01-25
* Added Storage Processors Information check 
* Added LUNs check
* Added RAID Groups check
* Added functionality that will pause for 7 seconds if an error occurs before showing the help text; this gives you time to read the error message
* Added error checking for "Could not connect to the specified host"
* Updated SP check to account for enclosures which return certain parts (Fans etc) with a status of N/A
* Fixed bug in the disk check that was counting Empty disk slots as disks
* Fixed bug "Illegal division by zero" error when running the sp_cbt_busy check
* Added information to the Help about what states will be returned for each check
* If the warn or crit arguments are supplied but their values are incorrect or missing, only an error is displayed; the help is not displayed
* Added full GNU license

2013-01-28
* Fixed RAID Group type being identified as hot_spare instead of 'Hot Spare'

2013-01-30
* Removed space from performance data string for LUN and RAID Group checks

2013-02-09
* Fixed bug in LUN and RAID Group checks, they were not triggering correctly on the warning and critical thresholds
* Updated the help to explain how the warning and critical thresholds are triggered as the existing help was not very clear

2013-03-09
* Plugin updated to incorporate new functionality of using a credentials file instead of supplying a username and password. This code was supplied by Uwe Kirbach
* Fixed a bug caused by the use of switch statements on older versions of Perl. Changed these switch statements to if/elsif statements so the plugin can run on older versions of Perl

2013-12-19
* Added an option to set min spare disk expected (--minspare <COUNT>) [updated plugin supplied by Yannig Perre]

2014-05-06
* Added a check to get the inlet air temperature as a Nagios perf metric (contributed by Max Vernimmen from www.comparegroup.eu)
* Fixed duplicate port bug when checking just one port. Can now check several specific ports at one time like --port 1,3. (Port fixes contributed by Stanislav German-Evtushenko)
* Added a check for reporting on Storage Pools (requested and tested by Vitaly Burshteyn, tested by Stanislav German-Evtushenko)

NOTE: I no longer have access to an EMC CLARiiON, so I can no longer continue to work on this project. It will need a new owner; anyone wanting to take it on, please email me.
